SIMD 0072: Feature Gate Threshold Automation #72

buffalojoec · 2023-10-12T13:22:01Z

This is SIMD 3/4 expected for Multi-Client Feature Gates. See #76

Summary

This SIMD outlines a proposal for automating the feature activation process
based on a stake-weighted support threshold, rather than manual human action.

With this new process, contributors no longer have to assess stake support for a
feature before activation. Instead, the assessment is done by the runtime.

A reference implementation for the Feature Gate program introduced in #89 and mentioned in this SIMD can be found here.

proposals/0066-feature-gate-threshold-automation.md

ripatel-fd

Thanks for submitting this SIMD - I'd like to ask about some clarification.

On first read, it wasn't immediately clear to me whether this change will install more human approval or less. Would it be possible to clarify in the summary whether:

Validator software will automatically signal all of its supported features when started, without further prompt from the operator.
Validator operators will now have to manually arm for feature gates via a validator CLI command.

The summary mentions that the process is automated and manual human action is removed, which would imply 1. But further below it mentions that the operator runbook now includes the new transaction step, which would imply 2.

FWIW, I oppose mechanism 1 because it'd make it virtually impossible to deploy a Labs-independent client safely. For various reasons, I expect Firedancer (validator software) to take a bit longer to implement new feature gates, and have a small minority of stake for the forseeable future. I'm worried that in mechanism 1, Labs nodes would just overrule other nodes all the time, and activate features that are not supported by all implementations.

CriesofCarrots · 2023-10-19T14:49:09Z

Validator software will automatically signal all of its supported features when started, without further prompt from the operator.

Validator operators will now have to manually arm for feature gates via a validator CLI command.

The summary mentions that the process is automated and manual human action is removed, which would imply 1. But further below it mentions that the operator runbook now includes the new transaction step, which would imply 2.

@ripatel-fd :
In this SIMD, we describe mechanism #1, but that is actually a client-specific design. Other clients or forks of clients could remove that automatic behavior and have validators signal feature support another way or on another cadence.

However, the more fundamental principle of this proposal is that actual activation of the feature is automatic based on percentage of stake support, default 95%. While the Labs client no longer represents anywhere near 95% of stake, it's true that no one entity or client implementation (including Firedancer) would be able to block activation en masse until they represent 5+% of stake.

Even if this SIMD is approved and implemented, feature-gate key-holders can continue to negotiate with all the clients to determine when to queue the feature for activation, as we do now -- and we envision separate SIMDs around formal governance on approving/queuing proposed features. So this SIMD wouldn't prevent the level of human control that currently exists.
But if we should use something other than stake-weight to determine how many validators are compatible with a particular feature, this is the place to discuss it.

lheeger-jump

Overall, this proposal is a much needed one. As I have mentioned though, in my comments, I think we need a separate SIMD for the new program and everything about how it will work.

I also think some consideration as to how this can tie into a governance-based mechanism would be good. For example, governance could approve the creation of new features, the description of such a vote would specify exactly what the new feature will do.

Good stuff so far!

proposals/0072-feature-gate-threshold-automation.md

lheeger-jump · 2023-10-19T20:07:27Z

We (@ripatel-fd and I) spent some more time discussing this proposal and we had several more comments on it:

The proposed changes implicitly hard fork on client software updates. There is a significant issue with the implicit activation of features on hard forks. This delegates the power to change consensus rules entirely to Solana Labs, which contradicts the expectation that the blockchain is governed by validators, stakers, holders, etc. Although it is a conscious choice for validators to upgrade their software, regular updates to software are a forcing function causing the activation of features. Similarly, a security fix shipped with a new features will certainly cause majority of stake to take the new update, even if they do not want the new feature activation.
This poses a particularly hard problem for adoption of Firedancer: If we start to deploy to mainnet beta, we will most definitely start out with less than 5% of stake. This would that in order to even have a chance of being stable on mainnet we would need more than 5% of stake. Its a major bootstrapping problem.
We need to see a specification for the structure of the new feature accounts.

We propose that there is 3 phase system for feature gate ratification:

Creation: anyone can create a new feature gate account, which is owned by the feature program. Creating a new feature gate must not have any additional runtime overhead.
Version availability: Each leader at the end of each slot, increments a counter in the feature accounts for all features they support, but are not activated. They also update a field in the feature account with the slot they are currently on. If this "last_updated_slot" is from the previous epoch during this process, store the previous counter and last updated field in the account in "previous epoch fields" and set the counter to 0. The feature becomes "available" or "armed for activation" when: 1) that previous epoch counter is very nearly 100% of slots_per_epoch and, 2) the previous epoch last_updated_slot field is in the immediately previous epoch. A feature is disarmed when the conditions are not met.
Activation: This should be an explicit, manual, on chain governance vote based on validator stake weight with some quorum. The properties of this vote (its duration, quorum, delay to activation after passage, participants) are to be determined but a majority vote for the new feature must result in the automated activation of the feature at an epoch boundary. This gives all stakeholders time to vie for, or protest the new feature. The feature must remain armed until the completion of the vote, if it becomes disarmed and then later rearmed, the vote should probably restart. An on-chain program should be devised for this process prior to the feature gate change.
After activation, a feature becomes active forever and leaders can stop updating the feature account. Clients can remove the gates in the code.

We would be happy to collaborate on proposal(s)/specification(s) which go in the above direction. As it stands, it introduces the aforementioned bootstrapping issue, which would give no recourse for the community to dissent to new features. This could prolong the development of clients with less resources.

More importantly than our own concerns though (which are admittedly self-interested) this proposal would further Solana Labs' hegemony on the chain via their control of client features. This may be against the wants of validators who actually control the blockspace of Solana. It should not be the version of the Solana Labs client or the clients implicit capabilities which control the blockspace, it must be the people.

CriesofCarrots · 2023-10-19T20:49:08Z

We propose that there is 3 phase system for feature gate ratification:

@lheeger-jump , thanks for your thoughts. To clarify, the system we envision is 2 phases:

Feature queuing: currently done manually by a single person, but to transition to an explicit governance vote. This creates the feature account (but this could be split into two steps -- account creation by "anyone" and some update of account state by the governance signer to indicate queuing -- if this division is needed).
Version availability: the proposal in this SIMD

To me, these two steps are synonymous to the Version Availability and Activation steps you describe, with the same essential stake and participation requirements, just in the opposite order. Can you please expand on how changing the order of these steps changes the amount of control any particular client wields?

ripatel-fd · 2023-10-19T23:44:15Z

Thanks for clarifying @CriesofCarrots. I should have read the SIMD more carefully. I think we are ideologically aligned in that this is only a "feature support discovery" mechanism, and that the actual activation logic can be specified in a separate PR. This is definitely a step in the right direction. The above criticism regarding risk to another clients that me and @lheeger-jump posted is therefore invalid.

We had some thoughts about how the permissioned part can be removed (adding new features to the queue).
These changes would also allow keeping the logic entirely as eBPF, such that zero protocol-level complexity is introduced.

It would involve changing the negotiation API like so:

Feature support advertisement transactions are only posted by leaders in every rotation
Support is accumulated in a variable-size window that is a multiple in an epoch
During each window, each feature account tracks the number of leader rotations during which support for this feature has been advertised
Instead of writing a bitmap, explicitly call a feature program instruction and write to the corresponding feature account
Feature gate support can be calculated by either advertised_rotation_cnt*4/slots_per_epoch (share of total stake) or advertised_rotation_cnt*4/blocks_produced_in_epoch (share of active stake)
A feature can be added to the queue by the feature account authority

Example API

enum FeatureInstruction {
    /// Initializes a feature account
    /// # Account references
    ///   0. `[writable]` The feature account
    ///   1. `[signer]` The feature authority
    Initialize,

    /// Closes a feature account without nominating it
    /// # Account references
    ///   0. `[writable]` The feature account
    ///   1. `[signer]` The feature authority
    ClosesFeature,

    /// Advertise support for feature. Can only be called by current leader, once per rotation.
    /// # Account references
    ///   0. `[writable]` The feature account
    ///   1. `[signer]` Current leader
    LeaderAdvertise,

    /// Nominates a feature for a governance vote.
    /// # Account references
    ///   0. `[writable]` The feature account
    ///   1. `[signer]` The feature authority
    Nominate,
}

The disadvantage of this change is that it increases the amount of transactions and data required to activate a feature.
(Also relevant comment by @CriesofCarrots: #72 (comment))

lheeger-jump · 2023-10-20T13:17:05Z

@lheeger-jump , thanks for your thoughts. To clarify, the system we envision is 2 phases:

Feature queuing: currently done manually by a single person, but to transition to an explicit governance vote. This creates the feature account (but this could be split into two steps -- account creation by "anyone" and some update of account state by the governance signer to indicate queuing -- if this division is needed).

Version availability: the proposal in this SIMD

To me, these two steps are synonymous to the Version Availability and Activation steps you describe, with the same essential stake and participation requirements, just in the opposite order. Can you please expand on how changing the order of these steps changes the amount of control any particular client wields?

This should be more clear in the SIMD and should be very explicit. Thanks for the clarification. Nonetheless, the arming mechanism described is relevant to the discussion.

proposals/0072-feature-gate-threshold-automation.md

lheeger-jump · 2023-10-20T15:44:46Z

To me, these two steps are synonymous to the Version Availability and Activation steps you describe, with the same essential stake and participation requirements, just in the opposite order. Can you please expand on how changing the order of these steps changes the amount of control any particular client wields?

The 3 steps I described are to allow anyone to create a new feature, but not to allow them to be able to be the activator (that would be governance). I think what I am saying is that anyone can do creation (i.e. anyone can propose new features) and that is separate from activation which should not be voted on. I think you basically say that here:

(but this could be split into two steps -- account creation by "anyone" and some update of account state by the governance signer to indicate queuing -- if this division is needed).

Can you please expand on how changing the order of these steps changes the amount of control any particular client wields?

By splitting up the first step into creation and then activation after version availability is satisfied, you disallow the feature owner from ramming the feature on and the choice to enable is explicit. Less about client control. The client control is more about a lack of governance on feature activation.

buffalojoec · 2024-02-05T21:52:47Z

Hey everyone! It's time to dust this one off again now that we have #88 and #89 merged!

We've introduced a completely revised version of this proposal, re-written from scratch. It should assess all of the discussion points raised above in previous threads, as well as numerous concerns raised in various offline discussions.

I'd love it if we can open back up the review process for this one, and start working toward a new feature activation process!

@CriesofCarrots @lheeger-jump @ripatel-fd

P.S. I've included a link to the Feature Gate program's reference implementation in the PR description.

lheeger-jump · 2024-02-21T19:02:09Z

@ripatel-fd and I can take another look in the next few days

topointon-jump

I like the general approach!

proposals/0072-feature-gate-threshold-automation.md

topointon-jump

Overall looks good, added some clarifying comments/questions, the answers to which would be good to have in the SIMD text.

Basically trying to be explicit about all assumptions/details, so that a client can implement this SIMD using just this SIMD.

Nice one!

proposals/0072-feature-gate-threshold-automation.md

CriesofCarrots

One substantive question, otherwise mostly suggestions for clarity.

proposals/0072-feature-gate-threshold-automation.md

CriesofCarrots · 2024-05-21T20:38:31Z

proposals/0072-feature-gate-threshold-automation.md

+Transactions sent too late (< 4500 slots before the epoch end) will be rejected
+by the Feature Gate program.


Why do we need any deadline/restriction? I see the program cache being alluded to, but I don't see any specification that changes to the Staged Features PDA must begin or conclude at some point earlier than the epoch boundary. It's not obvious why the Feature Gate program can't continue to modify the Staged Features PDA up to the epoch boundary.

Are you suggesting to explicitly explain the reasoning for this constraint in the SIMD, or are you asking why we actually need such a constraint? In either case, I agree it should be more explicit in the proposal.

It seems to me that @Lichtso intends to increase the cache preparation phase's initiation to be sooner in the epoch (comment), so we probably need a more concrete explanation as to why the Feature Gate program should have this constraint.

In this proposal, it was initially set to 128 slots remaining in the epoch because the cache preparation stage begins at a maximum of 128 slots before the end of the epoch, as computed here.

It's worth calling out that if we configure this constraint in the Feature Gate program and then later need to adjust it for an earlier cache preparation start, we'll be bound to using a feature gate for that change to the program.

I'm asking why we need a constraint at all. The cache being referenced is supposedly an agave-specific implementation, not part of the runtime protocol. If feature activation needs to work around any aspect of cache behavior, that seems like (a) a problem; and (b) a thing that definitely needs to be specified for all clients.

There's really nothing stopping the cache from being able to anticipate the upcoming feature environment by querying the PDA under the Feature Gate program that would tell it what the current stake support is for each feature. It's actually arguably equally or more deterministic than the way it's done now.

The cache could run the calculation at its current 128 slots before epoch boundary and probably get a pretty good understanding of which features have enough stake support. Increasing that slot buffer beyond 128 could just mean increasing the likelihood that not enough stake has been signaled for a feature. Perhaps this is a compromise the cache implementation should consider? After all, I believe recompilation is an optimization.

I've added a commit removing the "slots remaining" constraint, since I think the above is a sensible direction to head in. If there's explicit opposition, it can be added back in.

Sorry for my imprecise formulation, I meant scheduled for activation.
If a feature is scheduled for activation in some slot close to the boundary, then it is possible that the TX is not yet rooted. Thus, some validator nodes will not have it scheduled, and will not activate it on the boundary.

Please explain how this is different from the cluster coming to consensus on any other tx/block on which other logic depends.

For features which only affect their own fork this is irrelevant. But for features which change validator global behavior this would break. And I am not sure if there is much to be gained from making exceptions or harving different feature types.

Just adding an additional point of clarity here, if it helps.

Right now, in Agave, the cache is calling compute_active_feature_set, which will compute the feature set based on the on-chain accounts that have been created under Feature11111... (ie. "activated"). The cache does this ~128 slots before the epoch boundary to prepare the cache entries for the anticipated upcoming environment.

With the proposed system in place, the cache can still call compute_active_feature_set the same way it does now. The function will just compute the anticipated feature set from the proposed PDA (with stake support) rather than just the presence of on-chain accounts.

In almost every way it's the same as what we have now. The subtle difference is that someone can no longer activate a feature after the cache has done its feature-set computation, but nodes could still cast votes (signals) on a feature, bringing it to "eligible" status. This edge case exists today and just causes cache preparation/recompilation to be moot.

What @CriesofCarrots is suggesting is that we don't explicitly architect this mechanism around the cache, which is sensible, especially considering the cache preparation may change.

@Lichtso do you still have concerns here? I'd like to move this SIMD along.

proposals/0072-feature-gate-threshold-automation.md

CriesofCarrots

A couple small comments on the new PDA wording. Just the question about "slots remaining" constraint to be dealt with, as far as I'm concerned.

proposals/0072-feature-gate-threshold-automation.md

Co-authored-by: Tyera <[email protected]>

topointon-jump · 2024-12-06T22:48:12Z

proposals/0072-feature-gate-threshold-automation.md

+
+The `SignalSupportForStagedFeatures` instruction is structured as follows:
+
+- Data: A `u8` bit mask of the staged features.


Does this also need to include the epoch? As the list of staged features changes epoch-to-epoch, we need a way of ensuring that the bit mask in each validator support signal transaction is interpreted for the correct epoch. If the transaction is delayed or is executed in a different epoch than the one it was sent, for whatever reason, then the support signal will be mis-interpreted.

I would guess the Staged Features PDA supplied should indicate the intended epoch?

It's in the PDA for the validator's support signal. When they send a bitmask, it gets recorded in their signal PDA, and if they send again, it's overwritten. The current epoch is checked via Clock sysvar when the program is invoked with this instruction.

buffalojoec changed the title ~~feature-gate-threshold-automation init~~ 0066: Feature Gate Threshold Automation Oct 12, 2023

jacobcreech reviewed Oct 18, 2023

View reviewed changes

proposals/0066-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

mvines reviewed Oct 19, 2023

View reviewed changes

buffalojoec changed the title ~~0066: Feature Gate Threshold Automation~~ 0072: Feature Gate Threshold Automation Oct 19, 2023

ripatel-fd reviewed Oct 19, 2023

View reviewed changes

proposals/0066-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

ripatel-fd reviewed Oct 19, 2023

View reviewed changes

lheeger-jump reviewed Oct 19, 2023

View reviewed changes

lheeger-jump reviewed Oct 20, 2023

View reviewed changes

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

lheeger-jump reviewed Oct 20, 2023

View reviewed changes

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

buffalojoec mentioned this pull request Oct 26, 2023

Roadmap: Multi-Client Feature Gates #76

Open

6 tasks

buffalojoec changed the title ~~0072: Feature Gate Threshold Automation~~ SIMD 0072: Feature Gate Threshold Automation Oct 27, 2023

buffalojoec mentioned this pull request Oct 27, 2023

Bank: Add function to replace empty account with upgradeable program on feature activation solana-labs/solana#32783

Merged

buffalojoec mentioned this pull request Nov 9, 2023

SIMD Roadmap: Multi-Client Feature Gates buffalojoec/solana-improvement-documents#3

Closed

buffalojoec force-pushed the simd-feature-gate-threshold-automation branch from 546b3ab to d1817fa Compare January 26, 2024 03:49

buffalojoec force-pushed the simd-feature-gate-threshold-automation branch from ca3d674 to 2497aee Compare February 5, 2024 21:51

topointon-jump reviewed Feb 26, 2024

View reviewed changes

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

buffalojoec mentioned this pull request Mar 26, 2024

SIMD-0133: Syscall: Get Epoch Stake #133

Merged

topointon-jump reviewed May 11, 2024

View reviewed changes

buffalojoec force-pushed the simd-feature-gate-threshold-automation branch from a643bdf to 177b45a Compare May 13, 2024 14:31

topointon-jump approved these changes May 15, 2024

View reviewed changes

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

CriesofCarrots reviewed May 21, 2024

View reviewed changes

feature-gate-threshold-automation init

013d0e7

buffalojoec and others added 12 commits May 22, 2024 10:39

add state layout examples

d705e54

add multi-sig activation gate

a63abbb

add additional elaboration

7a6c3fc

init new version of 0072

23f1b24

add reusable support signal PDA

f830338

update to use validator epoch stake syscall

2fb901b

add updates from SIMD 0133

ab95740

add some context and wording

1a8f21a

add c struct for PDA layout

ea129ba

hard-coded threshold

bb1c797

change deadline to 4500 slots

b85755d

clarity suggestions

66e6371

buffalojoec force-pushed the simd-feature-gate-threshold-automation branch from 4d2b500 to 5ca6da6 Compare May 22, 2024 18:32

add more program specification

58cb436

buffalojoec force-pushed the simd-feature-gate-threshold-automation branch from 5ca6da6 to 58cb436 Compare May 22, 2024 18:38

CriesofCarrots reviewed May 23, 2024

View reviewed changes

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

buffalojoec added 2 commits June 3, 2024 14:03

one PDA per epoch

da132f7

remove slots remaining constraint

4de5abd

CriesofCarrots reviewed Jun 3, 2024

View reviewed changes

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

proposals/0072-feature-gate-threshold-automation.md Outdated Show resolved Hide resolved

buffalojoec and others added 5 commits June 3, 2024 14:20

Update proposals/0072-feature-gate-threshold-automation.md

072344f

Co-authored-by: Tyera <[email protected]>

instruction clarity

9296061

update StageFeatureForActivation instruction

458b3ac

update SignalSupportForStagedFeatures instruction

d9f399d

update runtime step

b87cadb

topointon-jump reviewed Dec 6, 2024

View reviewed changes

topointon-jump approved these changes Dec 7, 2024

View reviewed changes

buffalojoec added 2 commits December 9, 2024 18:44

update state init requirement

6e74a91

mark as idea

4591f55

topointon-jump approved these changes Dec 9, 2024

View reviewed changes

CriesofCarrots approved these changes Dec 30, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SIMD 0072: Feature Gate Threshold Automation #72

SIMD 0072: Feature Gate Threshold Automation #72

buffalojoec commented Oct 12, 2023 •

edited

Loading

ripatel-fd left a comment

CriesofCarrots commented Oct 19, 2023

lheeger-jump left a comment

lheeger-jump commented Oct 19, 2023 •

edited

Loading

CriesofCarrots commented Oct 19, 2023

ripatel-fd commented Oct 19, 2023 •

edited

Loading

lheeger-jump commented Oct 20, 2023

lheeger-jump commented Oct 20, 2023

buffalojoec commented Feb 5, 2024 •

edited

Loading

lheeger-jump commented Feb 21, 2024

topointon-jump left a comment •

edited

Loading

topointon-jump left a comment •

edited

Loading

CriesofCarrots left a comment

CriesofCarrots May 21, 2024

buffalojoec May 23, 2024

CriesofCarrots May 23, 2024

buffalojoec May 24, 2024 •

edited

Loading

buffalojoec Jun 3, 2024

Lichtso Jun 3, 2024

CriesofCarrots Jun 3, 2024 •

edited

Loading

Lichtso Jun 4, 2024

buffalojoec Jun 4, 2024

buffalojoec Aug 21, 2024

CriesofCarrots left a comment

topointon-jump Dec 6, 2024

ravyu-jump Dec 6, 2024

buffalojoec Dec 9, 2024

		Transactions sent too late (< 4500 slots before the epoch end) will be rejected
		by the Feature Gate program.


		The `SignalSupportForStagedFeatures` instruction is structured as follows:

		- Data: A `u8` bit mask of the staged features.

SIMD 0072: Feature Gate Threshold Automation #72

Are you sure you want to change the base?

SIMD 0072: Feature Gate Threshold Automation #72

Conversation

buffalojoec commented Oct 12, 2023 • edited Loading

Summary

ripatel-fd left a comment

Choose a reason for hiding this comment

CriesofCarrots commented Oct 19, 2023

lheeger-jump left a comment

Choose a reason for hiding this comment

lheeger-jump commented Oct 19, 2023 • edited Loading

CriesofCarrots commented Oct 19, 2023

ripatel-fd commented Oct 19, 2023 • edited Loading

lheeger-jump commented Oct 20, 2023

lheeger-jump commented Oct 20, 2023

buffalojoec commented Feb 5, 2024 • edited Loading

lheeger-jump commented Feb 21, 2024

topointon-jump left a comment • edited Loading

Choose a reason for hiding this comment

topointon-jump left a comment • edited Loading

Choose a reason for hiding this comment

CriesofCarrots left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

buffalojoec May 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CriesofCarrots Jun 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CriesofCarrots left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

buffalojoec commented Oct 12, 2023 •

edited

Loading

lheeger-jump commented Oct 19, 2023 •

edited

Loading

ripatel-fd commented Oct 19, 2023 •

edited

Loading

buffalojoec commented Feb 5, 2024 •

edited

Loading

topointon-jump left a comment •

edited

Loading

topointon-jump left a comment •

edited

Loading

buffalojoec May 24, 2024 •

edited

Loading

CriesofCarrots Jun 3, 2024 •

edited

Loading